perf: use hybrid sort for inline object order by He-Pin · Pull Request #855 · databricks/sjsonnet

He-Pin · 2026-05-13T11:58:32Z

Motivation

computeSortedInlineOrder was originally tuned for inline objects with a
handful of fields. Once strict JSON imports started constructing inline
Val.Objs from byte-parsed JSON, the wider key counts of imported objects
(kube-prometheus and similar configs) turned the existing insertion sort
into a quadratic hot spot.

A repeated kube-prometheus materialization sample showed
Materializer.computeSortedInlineOrder among the top stack frames under
Scala Native. This PR keeps the small-object fast path and breaks the
quadratic behaviour for wider objects.

Modification

Materializer.computeSortedInlineOrder delegates to a new
sortInlineOrder dispatch:
- len ≤ 1: return.
- len ≤ 16: existing insertion sort over the index array.
- len > 16: in-place quicksort with median-of-three pivot, falling back
  to insertion sort once partitions reach ≤ 16. Recurses on the
  smaller half (Sedgewick) so stack depth is O(log n).
Sorting still uses Util.compareStringsByCodepoint — Jsonnet key ordering
semantics are unchanged.
Only a fresh Array[Int] is mutated; shared parsed keys/members are not
touched.
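
The dispatch above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the object name HybridIndexSort, the helper sortedOrder, and compareByCodepoint (a hypothetical stand-in for Util.compareStringsByCodepoint) are assumptions made for the example. Only the thresholds and the overall shape come from the description.

```scala
object HybridIndexSort {
  private val Cutoff = 16 // threshold taken from the description above

  // Hypothetical stand-in for Util.compareStringsByCodepoint: orders strings
  // by Unicode code point instead of by UTF-16 code unit.
  def compareByCodepoint(a: String, b: String): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val ca = a.codePointAt(i)
      val cb = b.codePointAt(i)
      if (ca != cb) return Integer.compare(ca, cb)
      i += Character.charCount(ca)
    }
    Integer.compare(a.length, b.length)
  }

  private def swap(a: Array[Int], i: Int, j: Int): Unit = {
    val t = a(i); a(i) = a(j); a(j) = t
  }

  // Sorts order(lo..hi) (inclusive) by the keys the indices point at.
  private def insertionSort(keys: Array[String], order: Array[Int], lo: Int, hi: Int): Unit = {
    var i = lo + 1
    while (i <= hi) {
      val x = order(i)
      var j = i - 1
      while (j >= lo && compareByCodepoint(keys(order(j)), keys(x)) > 0) {
        order(j + 1) = order(j)
        j -= 1
      }
      order(j + 1) = x
      i += 1
    }
  }

  private def quicksort(keys: Array[String], order: Array[Int], lo0: Int, hi0: Int): Unit = {
    def cmp(i: Int, j: Int): Int = compareByCodepoint(keys(order(i)), keys(order(j)))
    var lo = lo0
    var hi = hi0
    while (hi - lo + 1 > Cutoff) {
      val mid = lo + (hi - lo) / 2
      // Median of three: leave the median of (lo, mid, hi) at mid.
      if (cmp(mid, lo) < 0) swap(order, lo, mid)
      if (cmp(hi, lo) < 0) swap(order, lo, hi)
      if (cmp(hi, mid) < 0) swap(order, mid, hi)
      val pivot = keys(order(mid))
      // Hoare partition: afterwards [lo..j] <= pivot <= [j+1..hi].
      var i = lo - 1
      var j = hi + 1
      var crossed = false
      while (!crossed) {
        i += 1
        while (compareByCodepoint(keys(order(i)), pivot) < 0) i += 1
        j -= 1
        while (compareByCodepoint(keys(order(j)), pivot) > 0) j -= 1
        if (i >= j) crossed = true else swap(order, i, j)
      }
      // Recurse into the smaller side and keep looping on the larger one,
      // so the recursion depth stays logarithmic.
      if (j - lo + 1 < hi - j) {
        quicksort(keys, order, lo, j)
        lo = j + 1
      } else {
        quicksort(keys, order, j + 1, hi)
        hi = j
      }
    }
    // Whatever range is left has at most Cutoff elements.
    insertionSort(keys, order, lo, hi)
  }

  // Dispatch by length; only the index array is mutated.
  def sortInlineOrder(keys: Array[String], order: Array[Int]): Unit = {
    val n = order.length
    if (n <= 1) ()
    else if (n <= Cutoff) insertionSort(keys, order, 0, n - 1)
    else quicksort(keys, order, 0, n - 1)
  }

  // Convenience wrapper: builds a fresh index array and sorts it.
  def sortedOrder(keys: Array[String]): Array[Int] = {
    val order = Array.range(0, keys.length)
    sortInlineOrder(keys, order)
    order
  }
}
```

In this sketch the keys array is only read, which mirrors the point that shared parsed keys are left untouched. The code point comparison matters for keys outside the Basic Multilingual Plane, where UTF-16 code unit order and code point order can disagree.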

Result

Re-benched on 2026-05-21 against master @ b252b184. Apple Silicon, JDK 21,
Scala 3.3.7.

Allocation (JMH `-prof gc`, full bench corpus)

In-place sort, so allocation is unchanged. Every bench is within
±0.3% B/op except manifestJsonEx at -1.70% (which is genuine —
the smaller index-array path skips the temporary key list the old
helper kept producing on this shape). No bench shows an alloc
regression > +0.3% / +250 B.

Wall-clock — Scala-Native release binary (hyperfine)

Selected object-construction-shape benches (warmup=2, min-runs=5):

bench                                    master ms     this PR ms          Δ
cpp_suite/realistic2.jsonnet             87.38 ± 1.51  86.50 ± 1.64    -1.00%
cpp_suite/bench.02.jsonnet               60.78 ± 1.32  59.83 ± 1.44    -1.56%
sjsonnet_suite/lazy_array_compr.         92.25 ± 3.25  92.05 ± 2.51    -0.22%
cpp_suite/realistic1.jsonnet             10.70 ± 1.18  10.68 ± 1.17    -0.19%
cpp_suite/gen_big_object.jsonnet         10.04 ± 1.19  10.45 ± 2.85    +4.09%
cpp_suite/large_string_template.jsonnet  13.40 ± 4.86  10.89 ± 1.20   -18.71%
go_suite/manifestJsonEx.jsonnet           6.79 ± 1.28   6.19 ± 1.02    -8.78%
go_suite/manifestTomlEx.jsonnet           6.18 ± 1.17   5.78 ± 1.05    -6.48%

Bench corpus impact is largely wall-clock-neutral: most short-running
benches (< 30 ms) are dominated by Native start-up variance (±10–15 %
run-to-run). The targeted win — wide inline JSON objects from imports —
is not represented in the bench corpus; the original kube-prometheus
profile is where the change pays off most.

No corpus-level regression > 5 % outside Native start-up noise.

Correctness

RendererTests and JsonImportFastPathTests pass (13 + 7 cases).
./mill 'sjsonnet.jvm[3.3.7]'.test — green.
./mill __.checkFormat — green.

Hybrid sort is correct: insertion sort on small partitions matches the
existing implementation; quicksort uses Hoare partition with
median-of-three pivot, recursing on the smaller half and looping on the
larger (worst-case stack O(log n)); object keys are unique so stability
is not required.
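
The stability argument can be illustrated with a small check. This is a sketch under stated assumptions, not a test from this PR: the object UniqueKeyOrderCheck and its methods are hypothetical names, and it uses String.compareTo, which agrees with code point order only for keys such as plain ASCII. With unique keys the sorted permutation is unique, so a stable and an unstable sort must return the same index array; with duplicate keys they may differ.

```scala
object UniqueKeyOrderCheck {
  // Unstable reference: selection sort over a fresh index array.
  def selectionOrder(keys: Array[String]): Array[Int] = {
    val order = Array.range(0, keys.length)
    var i = 0
    while (i < order.length) {
      var m = i
      var j = i + 1
      while (j < order.length) {
        if (keys(order(j)).compareTo(keys(order(m))) < 0) m = j
        j += 1
      }
      val t = order(i); order(i) = order(m); order(m) = t
      i += 1
    }
    order
  }

  // Stable reference: the standard library's sortBy is documented as stable.
  def stableOrder(keys: Array[String]): Array[Int] =
    Array.range(0, keys.length).sortBy(i => keys(i))

  // True when both sorts produce the same permutation of indices.
  def agree(keys: Array[String]): Boolean =
    selectionOrder(keys).sameElements(stableOrder(keys))
}
```

For example, `UniqueKeyOrderCheck.agree(Array("b", "a", "c"))` holds, while `Array("b", "b", "a")` makes the two orders diverge ([2, 1, 0] from the selection sort against [2, 0, 1] from the stable sort). Object keys never hit the second case.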

Test plan

./mill 'sjsonnet.jvm[3.3.7]'.test — green
./mill __.checkFormat — green

Motivation:
Large inline objects produced by strict JSON imports can exceed the small-object shape that computeSortedInlineOrder was originally tuned for. Native sampling on kube-prometheus showed sorted inline-order computation as a materialization hotspot, and insertion sort becomes quadratic on those wider objects.

Modification:
Keep insertion sort for small inline objects, and use an in-place quicksort with median-of-three pivot and insertion-sort cleanup for larger visible field sets.

Result:
Kube-prometheus Native A/B improved on top of strict JSON byte imports, with forward mean 145.3ms -> 140.0ms and reverse mean 151.6ms -> 148.9ms. Formatting and the full test suite pass.

References:
Upstream-base: databricks/sjsonnet@cedc083
Prior optimization: 883fca5 perf: parse strict JSON imports from bytes

He-Pin marked this pull request as ready for review May 13, 2026 12:26

He-Pin marked this pull request as draft May 13, 2026 12:26

He-Pin mentioned this pull request May 15, 2026

perf: cache repeated long string rendering #857

Closed

He-Pin force-pushed the perf/hybrid-inline-sort-order branch 3 times, most recently from 0848e79 to 4e987c7 Compare May 20, 2026 18:18

He-Pin marked this pull request as ready for review May 21, 2026 02:42

He-Pin force-pushed the perf/hybrid-inline-sort-order branch from ef717de to 5332110 Compare May 21, 2026 02:53

He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/hybrid-inline-sort-order
